Incremental Linearization with CDL-TAGs in VM-GEN (Memo 74)
Author
Abstract
Incremental processing is a promising way to improve the performance of natural language generation systems. It imposes specific requirements upon the linearization process and its underlying representation formalism. In this work, we discuss these requirements and present an extension of the TAG formalism that has been developed and utilized for the incremental syntactic generator VM-GEN in VERBMOBIL, thereby illuminating how it fulfils the demands identified.

1 Motivation

An incremental component can be characterized as consuming incremental input and producing incremental output in an interleaved fashion. Incremental processing is a promising way to improve the efficiency and flexibility of a system. In a cascade of incremental components, the work of the involved components may overlap in time. Because of this parallelism, the efficiency of the overall system is expected to improve. Besides, feedback between incremental components can be described as handing over additional input information that is consumed via the incremental input interface of the receiver. This eases the cooperation of several components on complex tasks. In this work, we focus on the problem of incremental generation. The input for incremental generators is given stepwise, allowing the system to start processing parts of the `message to be verbalized' before the input is complete. An incremental generator is able to produce partial output that in turn can be handed over to an `articulation component' in an incremental fashion. This style of processing is especially useful for producing spoken output, since there are strong time constraints resulting from the expectations of the hearer. The incremental syntactic generator VM-GEN has been developed for VERBMOBIL (cf. [FHK+92, Wah93]) to produce English utterances in the context of face-to-face dialog translation. It is based on TAG-GEN, the generation component of the system WIP (see [WAF+93]).
We have concentrated on examining the demands of incremental processing on the architecture and the syntactic representation formalism of the generator. An incremental style of processing imposes specific requirements upon the linearization process and its underlying representation formalism. In this work[1], we discuss these requirements and present an extension of the TAG formalism that has been developed for VM-GEN and that fulfils the demands identified.

[1] Thanks to Wolfgang Finkler, Karin Harbusch, and Peter Poller for their helpful comments.

The characteristics of an incremental style of generation impose additional requirements on the representation and processing of syntactic structures. Working on the basis of incrementally given input elements in general means working with incomplete information. Some decisions made on the basis of the data at hand necessitate assumptions about the outstanding input, thereby reducing the set of input increments that can be consistently integrated into processing. This motivates postponing decisions until there is reliable and sufficient input information, which is given at the latest when the input has been completely handed over. On the other hand, producing incremental output presupposes that the system commits itself to some decisions that might contradict input information given later on. The competing goals of accuracy and speed necessitate a sophisticated treatment of the WHEN-to-SAY task, which can be defined as deciding on an adequate timing for the utterance of output increments. Additionally, a system producing incremental output must provide methods to repair previously uttered parts of a sentence in case of unsolvable contradictions. For syntactic generation, two levels of processing can be distinguished that are involved in verbalizing input increments. First, for each added element the hierarchical structure underlying the sentence under construction has to be expanded.
Second, new elements have to be positioned in the final utterance. Obviously, each element that has been integrated into a hierarchical structure can in principle be linearized. As soon as the system produces incremental output it has to fix some linearization results in order to utter parts of a sentence. Now it might happen that elements given later on cannot be linearized or integrated into the hierarchical structure any more without retracting some parts of the output produced so far. The system has to inform its addressee about changes by producing `correction messages' that enable the addressee to understand the revision. Since the addressee of a natural language front-end system is a human, strategies to encode `correction messages' have to be adapted to the needs of the human interlocutor. Regarding the possible effects of incremental processing on syntactic generation, it is essential to choose a syntactic representation formalism that facilitates the dynamic construction of the hierarchical structure and the stepwise linearization of its substructures. Lexicalized Tree Adjoining Grammars (cf. [JLT75, SAJ88]) are not only well suited for the representation of natural language but also provide two combination operations, adjunction and substitution, that allow for different kinds of expansion of the syntactic tree. Therefore, they support the integration of new input elements into the hierarchical structure. Unfortunately, the problems of incremental linearization are not solved by the currently known extensions of the TAG formalism. A new extension, CDL-TAG, is presented in this work that enables VM-GEN to position elements flexibly and facilitates our first approaches to incremental output and revision. The requirements we state for the formalism are listed in Section 2. In Section 3, we define CDL-TAGs and describe how they fulfil the requirements.
Section 4 illustrates some algorithms of the Linearization Component of VM-GEN where the specific features of CDL-TAGs are used. In Section 5 we summarize our results and motivate future work.

2 Representation of Linearization Rules

For syntactic representation formalisms there are two basic ways to relate the description of hierarchical structures and the description of linearization rules. First, syntactic rules may describe hierarchical as well as positional facts within the same structure by interpreting mother-daughter relations as hierarchical arrangements and neighborhood relations as positional arrangements. These `fixed structures' are used by, e.g., context-free grammars (CFGs) or standard Tree Adjoining Grammars (TAGs). Second, there might be two distinct sets of rules, one merely describing hierarchical mother-daughter relations without interpreting the position of elements, the other describing positional constraints by referring to elements of the hierarchical structures. We will call this approach the `H/P approach' (Hierarchical/Positional). It is, e.g., realized by ID/LP grammars (cf. e.g. [GKPS85]) or LD/LP TAGs (cf. [Jos87]). The following discussion motivates our decision to follow the H/P paradigm for incremental syntactic generation.

2.1 Generation as Choosing among Alternatives

For generation, the possibility of realizing a large bandwidth of linguistic phenomena is not the kernel feature. Most importantly, the generator must be able to choose among alternatives in order to produce exactly one adequate utterance. There are alternatives at several levels of the so-called HOW-to-SAY part of generation, i.e., during microplanning, lexical choice, the choice of hierarchical structures, and linearization. Each decision should be influenced by, e.g., global consistency constraints for the utterance, the relation to the surrounding text or dialogue, the situation of the `speaker', a model of the hearer, and time restrictions on processing.
Describing the influence of single criteria on different types of decisions is eased by realizing local views on the levels of decision, i.e., by defining separate sets of rules. The consequences of this statement for the linearization level are described on the basis of the following example. The German linearization rule `pronoun in front of noun' is an important factor for positioning the elements of the `Middlefield'[2]. Diverging from this rule often means emphasizing the pronoun:

,,Er gibt ihm das Buch." (He gives him the book)
,,Er gibt das Buch ihm."

It makes little sense to integrate both linearization rules `pronoun in front of noun' and `noun in front of pronoun' into all relevant fixed structures. Besides making the grammar redundant, this integration would make it necessary to associate all these structures with markers that support the choice of an adequate linearization in the situation at hand. It seems more reasonable to evaluate isolated linearization rules. Another problem comes up with the size of fixed structures. Possibly, large structures may be described as resulting from the application of various combinations of linearization rules. This (virtual) preprocessing of linearization constraints is disadvantageous for choosing an adequate structure, since it is hard to describe the meaning of the single linearization decisions on the basis of their intermixed occurrence in the fixed structure. Again, it should be possible to select H-rules, evaluate the isolated linearization rules (P-rules), and weigh which ones to apply and which not to apply.

2.2 Incremental Generation and Incremental Output Production

Incremental generation and especially incremental output production imply mixed processing at the hierarchical and the positional level. In order to utter the first parts of a sentence, the elements must have been integrated into the global structure and must have been linearized.
Later on, the addition of new input increments causes further combination operations at the hierarchical level and further linearization at the positional level. If the syntactic representation formalism in use does not allow for an H/P distinction, choosing a hierarchical structure means deciding about the ordering of all included elements. Depending on the size of grammar rules, this decision affects several elements. Among them, there may be elements whose positioning cannot be fixed before some further input information has been given. For example, a lexicalized structure describing the bitransitive verb "geben" (to give) might include descriptions of the subject, the direct and the indirect object. Supposing subject and verb have already been given in the input, they can only be uttered if the positioning of the objects is fixed at the same time. This might lead to contradictions with the input given later on, e.g.:

,,Peter gibt obj(dat) obj(acc)." (Peter gives s.o. s.th.)
,,Peter gibt es seinem Freund." (obj(acc) obj(dat)) (Peter gives it to his friend)

[2] The Middlefield is the region between finite and non-finite verb.

Here, the chosen rule induced the word order indirect object before direct object, while some time later the realization of the direct object as a pronoun makes the inverse ordering more adequate.

2.3 The Relation of Hierarchical and Positional Rules

Following the discussions of Sections 2.1 and 2.2, a grammar formalism for incremental generation must be flexible enough to preserve word order variations as long as possible during the generation process. Thereby, it should be easy to handle the prefix of the sentence already uttered as constraining the set of applicable linearization rules. Additionally, the grammar formalism should support the formulation of relations between linearization rules and situational factors in order to direct the choice of adequate rules. The H/P paradigm is one possible means to fulfil these requirements.
The isolated linearization rules can be evaluated with respect to their adequacy in various situations (see left part of Figure 1). This allows for their meaningful application, i.e., the fixing of hierarchical structures according to the situation at hand. Alternatively, a grammar based on fixed structures can be used as a starting point. Here, the size of the structures is relevant for the granularity of steps during incremental processing. The smaller they are, the more effectively can the combination operations be used to linearize the respective elements. These operations should be guided by linearization constraints (see right part of Figure 1).

[Figure 1: Alternative Approaches for Guiding Linearization. Left: trees t1 ... tn plus linearization constraints upon subsets of {t1 ... tn}. Right: trees Xi, Xj, Xk plus linearization constraints upon the combination of Xi-trees.]

Figure 2 shows two intuitive examples for the two approaches without referring to any known formalism. In the left part, a hierarchical VP structure is described as consisting of subject, object, and verb. The associated linearization constraints define two alternative linearizations of the leaves, leading to Subject-Verb-Object and Object-Verb-Subject, respectively. In the right part of Figure 2, several fixed structures describe the elements of a verbal phrase. They can be combined by unifying two VPi nodes according to the given linearization constraints, which allow for two different sets of combinations. Their application leads to two structures whose leaves are ordered as Subject-Verb-Object and Object-Verb-Subject, respectively.

[Figure 2: Examples for Different Approaches to Linearization. Left: a VP over SUBJ, OBJ, V with the constraints {(SUBJ < V < OBJ), (OBJ < V < SUBJ)}. Right: fixed VP structures VP1 ... VP7 combined as {VP2+VP5, VP6+VP3, VP4+VP7} or {VP4+VP5, VP6+VP1, VP2+VP7}.]
2.4 Output Revisions

Since incremental generation includes the production of incremental output, there is the risk of uttering parts of the sentence which are not compatible with some input given later on. For example, ,,Der Junge spielt ..." (the boy plays) might already have been uttered when a modifier ,,klein" (small) for ,,Junge" is given. Introducing the modifier at this point in time causes a revision at the positional level. At the hierarchical level, the structure is either extended in the normal way or also revised, depending on whether hierarchical and positional constraints are mixed within the grammatical representation. Consequently, the evaluation of the two approaches for this specific feature of incremental generation depends on comparing the efficiency of the two styles of revision. Our impression is that it is advantageous to withdraw decisions at one level without affecting the other one, as is possible for the H/P approach: positional decisions can be withdrawn without rebuilding parts of the syntactic structure. That is why we decided for the H/P approach and started extending the TAG formalism in a way that supports flexible linearization and output repair.

3 CDL-TAG

TAG with Context-Dependent Disjunctive Linearization Rules (CDL-TAG) is an extension of Tree Adjoining Grammar (cf. [JLT75]) that will be briefly described at this point.

3.1 The Standard TAG Formalism

The elementary rules of a TAG are phrase structure trees of two kinds. Initial trees consist of internal nodes[3] labeled with nonterminals and of leaves labeled with terminals. They are used to describe complete phrases. Auxiliary trees look quite the same except for one leaf that is labeled with a nonterminal, namely the same name as the root node has. This leaf is called the foot node. Auxiliary trees are mostly used to describe recursive or modifying structures.
The combination operation of adjunction is defined as substituting an internal node of an elementary (or derived) tree by an auxiliary tree whose root and foot node are labeled with the same name as the node of adjunction (see Figure 3). This combination operation makes the grammar mildly context-sensitive and therefore adequate for the representation of natural language (cf. [KJ85]).

[Figure 3: Elementary Trees and Adjunction in a TAG. An auxiliary tree with root and foot node X is adjoined at an X node of an initial tree with root S; S, X are nonterminals, the remaining leaves are terminals.]

[3] The root node, which is labeled with S, the start symbol of the grammar, is also regarded as an internal node in our terminology.

The TAG formalism has been extended by a second kind of combination operation which has context-free power and therefore does not increase the generative power of TAGs. TAGs with Substitution (cf. [SAJ88]) define elementary trees with another special type of leaves called substitution nodes. The labels of substitution nodes are nonterminals marked with a downward arrow in order to distinguish them from foot nodes (see Figure 4). The set of initial trees contains substitution trees whose root nodes may be labeled with arbitrary nonterminals. A substitution is defined as the replacement of a substitution node by a substitution tree whose root node has the same label as the substitution node itself. Since a derivation is not completed before all substitution nodes have been replaced by substitution trees, substitution is an obligatory combination operation for trees. In order to allow compact representations of complex syntactic dependencies, we have extended Tree Adjoining Grammar by feature structures. For more details about TAGs with unification (UTAGs) see [Kil92]. The H/P paradigm has been applied to TAG by [Jos87]. He defined LD/LP-TAG by "taking the elementary trees as domination structures over which linear precedences can be defined."
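To make the adjunction operation concrete, here is a minimal sketch over plain (label, children) tuples. This encoding and the example labels are our own assumptions for illustration; real TAG elementary trees additionally carry feature structures.

```python
def adjoin(tree, aux):
    """Adjoin auxiliary tree `aux` at the first node labeled like its root.

    The foot node -- the leaf of `aux` sharing the root's label -- receives
    the subtree that hung below the node of adjunction.
    """
    label, children = tree
    aux_label, aux_children = aux
    if label == aux_label:
        plugged = [
            (label, children) if c == (aux_label, []) else c
            for c in aux_children
        ]
        return (aux_label, plugged)
    return (label, [adjoin(c, aux) for c in children])

# An initial tree with an internal X node, and an X auxiliary tree
# whose foot node is the bare leaf ("X", []).
initial = ("S", [("X", [("v1", [])]), ("w2", [])])
aux = ("X", [("a", []), ("X", [])])
print(adjoin(initial, aux))
# ('S', [('X', [('a', []), ('X', [('v1', [])])]), ('w2', [])])
```

The sketch adjoins only at the first matching node; the formalism of course permits choosing any internal node of the right label.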
The nodes are identified by addresses to which linear precedence statements refer. By leaving out ordering constraints between nodes, variations in the positioning of elements are defined. LD/LP-TAGs only partially fulfil the requirements on flexible linearization stated above. The descriptive power of LP rules is not sufficient to describe all linearization alternatives of one hierarchical structure (e.g., for German verbal phrases subject-verb-object, object-verb-subject, ...). Furthermore, there is no means to associate different P-rules with contextual (semantic and pragmatic) constraints. We decided to develop a new extension of TAG on the basis of LD/LP-TAGs that is more suitable for flexible linearization.

3.2 CDL-TAG

CDL-TAG is defined according to the H/P paradigm, i.e., domination structures are used as elementary structures instead of trees. The possible orderings of the nodes of local structures and of nodes of auxiliary structures that might be inserted by adjunction are restricted by linearization rules which are associated with internal nodes. They have the following form:

    "(<" ("(" context lin-rule* ")")* ")"

The rules are initiated by the key "<". Each alternative starts with the name of a context from the set CONTEXT in which it may be used. The special name "any" refers to an arbitrary context, i.e., the rule may be used in any case. Each other name refers to a specific context that has to be identified by the surrounding syntactic structure or to be specified by semantic and pragmatic input information. For CDL-TAGs we define a special feature `lin-context' whose value is the actual context from CONTEXT. It has to be part of the feature structure associated with the respective node (or `any' is used as a default). The value is set when examining the input information for generation and can be inherited via path equations through the feature structures of neighboring nodes.
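As an illustration of how the `lin-context' value selects among rule alternatives, a minimal sketch follows; the dictionary encoding and the rule bodies are assumptions of ours, not the system's actual data structures.

```python
def select_lin_rules(p_rules, lin_context):
    """Return the lin-rules for the node's `lin-context` feature value.

    Falls back to the special context "any", which may be used in any case.
    """
    if lin_context in p_rules:
        return p_rules[lin_context]
    return p_rules.get("any", [])

# Rules as lists of daughter indices (1 = verb, 2 = subject, 3 = object).
vp_rules = {
    "verb-first":  [[1, 2, 3]],
    "verb-second": [[2, 1, 3]],
    "verb-final":  [[2, 3, 1]],
}
np_rules = {
    "any":   [[1, 2, 3]],
    "short": [[3, 1, 2]],
}

# A declarative sentence sets lin-context to "verb-second".
print(select_lin_rules(vp_rules, "verb-second"))  # [[2, 1, 3]]
# An unknown context at the NP node falls back to "any".
print(select_lin_rules(np_rules, "declarative"))  # [[1, 2, 3]]
```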
The left part of Figure 4 illustrates a VP node whose subtree represents a German verbal phrase. Its linearization rules include statements about verb-first, verb-second, and verb-final word order.

[Figure 4: Examples for German Linearization Rules. Left: a VP node with daughters V, Subj#, and Accobj#, the feature [lin-context: verb-second], and the rule (< (verb-first ...) (verb-second ...) (verb-final ...)). Right: an NP node with daughters Specifier#, N, and Modifier# and the rule (< (any ...) (short ...)).]

Its feature structure contains the value `verb-second' for the attribute `lin-context'. It may be set, e.g., when computing the syntactic realization for the sentence type `declarative'. Other contexts (like `any' or `short' at the NP node in the figure) distinguish word order rules that differ with respect to their suitability for specific situational, non-syntactic, factors. With `short' a word order is marked that is useful to save space and time in the final utterance. In a generation system there may be `generation parameters', set globally for each system run, that are used to evaluate linearization alternatives (see Section 4.3). Each word order rule (lin-rule) is encoded as a list that may contain some linearization elements (lin-els). The order of elements in the list defines the order of the elements of the TAG tree referred to. There are some alternatives for defining a lin-el:

- A number num is a lin-el. Each number num refers to a daughter of the node the linearization rule is associated with (i refers to the ith daughter).
- (sym1 ... symn)^1 is a lin-el, describing exactly one occurrence of one of the elements referred to by symi. Each symbol refers to an optional element from OPT that may be incorporated into the phrase. Each auxiliary tree of the grammar also has to be associated with one symbol from OPT, thereby identifying the modifying structures as elements of linearization rules. The symbols that denote adjuncts must make it possible to distinguish the elements in a way that is detailed enough to express all aspects that influence word positions.
For English, e.g., different classes of adverbs have to be defined expressing their different linearization constraints.

- (sym1 ... symn)^1/0 is a lin-el, describing one or zero occurrences of one of the elements referred to by symi.
- (sym1 ... symn)^+ is a lin-el, describing an arbitrary repetition (at least one occurrence) of elements referred to by symi.
- (sym1 ... symn)^* is a lin-el, describing an arbitrary repetition (including zero occurrences) of elements referred to by symi.

We use regular expressions as a simple means to describe the position of (an arbitrary number of) optional elements that may occur within sentential structures. Of course, this approach does not solve the hard problem of describing all linearization phenomena that include adjuncts. Nevertheless, we use it as a starting point, concentrating on the usefulness of the formalism for incremental processing while keeping in mind its shortcomings at the linguistic level. The following expression is an example of a linearization rule and might be associated with the VP node of Figure 4:

    (< ...
       (verb-second (2 1 ... (advp)^* ... 3 ...)
                    ((advp)^1 1 2 ... (advp)^* ... 3 ...)
                    ...)
    )

The linearization rule shows two alternatives to fill the first position of a verbal phrase in the linearization context verb-second, namely a complement (`2' refers to the subject) or exactly one optional element (`advp' refers to an adverbial phrase). After the first element, the finite part of the inflected verb (referred to by `1' in the linearization rule) has to follow. The second expression prescribes that the subject directly follows the verb in case of a topicalized adverbial phrase. Somewhere in the verbal phrase an arbitrary number of adverbs may be inserted, as denoted by `(advp)^*'. In addition to the context information, the selection of adequate linearization rules may be restricted by information about the subtree to be linearized.
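The regular-expression reading of lin-els can be emulated directly. The following sketch, using an encoding of our own rather than VM-GEN's, compiles a lin-rule into a Python regex and checks a proposed ordering against it.

```python
import re

# A lin-rule is a list of lin-els: an int refers to the i-th daughter;
# a pair (symbols, marker) with marker "1", "1/0", "+", or "*" describes
# optional adjunct elements, exactly as in the notation above.
MARKERS = {"1": "", "1/0": "?", "+": "+", "*": "*"}

def compile_lin_rule(rule):
    """Compile a lin-rule into a regex over space-terminated element names."""
    parts = []
    for el in rule:
        if isinstance(el, int):
            parts.append(rf"{el}\s")
        else:
            syms, marker = el
            alt = "|".join(syms)
            parts.append(rf"(?:(?:{alt})\s){MARKERS[marker]}")
    return re.compile("".join(parts) + "$")

def matches(rule, order):
    """Check whether a proposed ordering satisfies the lin-rule."""
    text = "".join(f"{tok} " for tok in order)
    return compile_lin_rule(rule).match(text) is not None

# The verb-second alternative (2 1 (advp)* 3): subject, finite verb,
# any number of adverbial phrases, object.
verb_second = [2, 1, (["advp"], "*"), 3]
print(matches(verb_second, [2, 1, "advp", "advp", 3]))  # True
print(matches(verb_second, [2, 3, 1]))                  # False
```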
CDL-TAGs use child-info that is inherited from the daughters of the node the linearization rule is associated with. The resulting structure for P-rules is

    "(<" ("(" context child-info lin-rule* ")")* ")"

By means of the entry child-info a specific test is realized (identified by the key `test') for feature-value combinations which have to hold for some of the daughters of the actual node. The P-rule

    (< ... (short (test (3 (cat) name))
                  (... 3 ... (adjp)^* ... 2 ...)
                  ...)
       ...)

might be associated with the NP node in the right part of Figure 4. It describes a possible linearization of a Specifier-Noun-Modifier construction in German: instead of "Die Werke Goethes" (the works of Goethe) it is also possible to say "Goethes Werke" (Goethe's works). The presupposition for choosing this `brief'[4] linearization alternative is that the modifier is realized as a proper name, which is tested by referring to the third daughter of NP (Modifier#) and then checking the equality of the feature value of `cat' and the atomic value `name'. In the following, we use the terms modifier auxiliary tree and predicative auxiliary tree that have been introduced by [SS92]. They distinguish recursive structures that introduce (multiple) modifiers, e.g. adjectives, from those that realize predication relations, e.g., for raising and sentential complements. The symbols used within the linearization elements of our rules refer to modifiers possibly introduced by modifier auxiliary trees. Figure 5 shows the result of adjoining an adverb into the VP node of Figure 4. The linearization rule that had been associated with the node of adjunction is now associated with the root node of the modifier tree. The path from the root to the foot node of the auxiliary tree is ignored for the purpose of linearization. In this way, an intermixing of the daughters of the node of adjunction and the leaves of the modifier tree can be achieved, just as described in the linearization elements.
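Returning to the child-info entry: its `test' can be evaluated as in the following minimal sketch, where daughters are modeled as nested feature-structure dictionaries (an encoding we assume for illustration).

```python
def check_child_info(test, daughters):
    """Evaluate a child-info test (pos (path ...) value) against the daughters.

    `pos` names a daughter by 1-based position, `path` is a feature path into
    its feature structure, and `value` is the required atomic value.
    """
    pos, path, value = test
    fs = daughters[pos - 1]          # positions are 1-based, as in P-rules
    for feature in path:
        fs = fs.get(feature, {})
    return fs == value

# The `short' NP rule requires the modifier (3rd daughter) to be a proper name,
# i.e. (test (3 (cat) name)).
daughters = [
    {"cat": "det"},                  # Specifier
    {"cat": "n"},                    # Noun
    {"cat": "name"},                 # Modifier realized as a proper name
]
print(check_child_info((3, ("cat",), "name"), daughters))  # True
```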
In a German verb-second phrase, e.g., the leaves can be ordered as Subj# V Accobj# Adv (,,Ich liebe ihn innig." - I love him sincerely.) or Subj# V Adv Accobj# (,,Ich koche schnell eine Suppe." - I quickly cook a soup.). While it is possible to adjoin several modifier trees into `the same' node of adjunction, the sequence of adjunctions stops after introducing one predicative tree. The nodes of predicative trees are not mixed with the daughters of the node of adjunction, i.e., they are not referred to by its linearization rules. Instead, the predicative trees, actually predicative domination structures, are interpreted structurally with respect to their depth. They have to be associated with local linearization rules that constrain the relative position of the foot node, that means, after adjunction, the relative position of the subtree of the node of adjunction.

[4] The key `short' is meant in the sense of saving space and time in the final utterance when using this alternative. This may be meaningful under time pressure or when the space allocated for the written text is restricted.

[Figure 5: Adjunction of a Modifier Auxiliary Tree. After adjunction, the Adv leaf and the former daughters of the VP node (V, Subj#, Accobj#) hang below the new VP root, which now carries the rule (< (verb-first ...) (verb-second ...) (verb-final ...)).]

This specific interpretation of the effect of adjunction on P-rules is comparable to the operation of furcation, which has been defined by [DK88] as the unification of two root nodes of structures to one root node with two substructures. (Virtually) flattening the structures makes it easier to handle the embedding of modifying elements. CDL-TAGs combine flexible combination operations at the hierarchical level with a treatment of P-rules that supports incremental linearization. The representation of linearization rules allows for the consideration of situational factors when computing an adequate word order.
Their encoding as lists with regular expressions is advantageous for keeping track of the incremental output: the description of already uttered words is stored as a prefix that has to be matched with all P-rules in order to find possible continuations.

4 The Linearization Component of VM-GEN

CDL-TAGs have been developed to improve incremental linearization in VM-GEN (and its predecessor TAG-GEN), the component for syntactic generation in VERBMOBIL. In the following we show how this extension of the TAG formalism is used in our system.

4.1 An Overview of VM-GEN

VM-GEN has been designed as a syntactic generator that realizes full incrementality, i.e., it consumes incremental input and produces incremental output in an interleaved fashion. The input is organized in the form of packages either describing lexical items (the results of lexical choice) or specifying semantic relations between those items. The input increments can be handed over in an arbitrary order and with arbitrarily long pauses. VM-GEN tries to integrate all increments that are given during one system run into one utterance, whose elements are handed over to an articulation component as soon as possible. Incremental output production is defined as being able to utter parts of a sentence before the input has been completely specified, forcing the system to make decisions on the basis of incomplete input. There may be contradictions with input increments that are given later on (remember the example of Section 2.4). A repair component has to guide the hidden or overt repair of the prefix of the utterance already produced. A second kernel feature of VM-GEN is fine-grained parallelism that is realized by a distributed parallel model of active cooperating objects. Input increments for incremental generation have to be defined in a way that allows for a rather independent computation of the single parts. This presupposition additionally motivates parallelism.
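The prefix matching over P-rules described at the end of Section 3 can be sketched minimally as follows. We use flat rules (plain lists of element names) and names of our own choosing; the actual rules also contain the regular-expression lin-els discussed above.

```python
def admits_prefix(rule, prefix):
    """Check whether a flat rule still admits the already uttered prefix."""
    return rule[: len(prefix)] == prefix

def continuations(rules, prefix):
    """Return the possible continuations of the utterance under each
    P-rule that is still compatible with the uttered prefix."""
    return [rule[len(prefix):] for rule in rules if admits_prefix(rule, prefix)]

# The two object orders for "geben" from Section 2.2.
rules = [
    ["subj", "v", "datobj", "accobj"],   # ,,Peter gibt seinem Freund das Buch."
    ["subj", "v", "accobj", "datobj"],   # ,,Peter gibt es seinem Freund."
]
uttered = ["subj", "v"]                  # ,,Peter gibt ..." already spoken
print(continuations(rules, uttered))
# [['datobj', 'accobj'], ['accobj', 'datobj']] -- both orders still open
```

Once an object is actually uttered, the surviving rule set shrinks; if it becomes empty, the system has run into the repair situation of Section 2.4.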
For each input increment an object (a process) is created that is responsible for verbalizing the increment in concert with the other objects. The objects form a hierarchy according to the dependency relations that hold between the lexical items they realize. The hierarchy is used as a means for global synchronization, e.g., for uttering a consistent sequence of output increments. First of all, each object chooses a syntactic structure that represents the syntactic constraints associated with the input increment. Afterwards, it runs through two different phases, trying to solve the task of verbalization by exchanging appropriate messages with other objects. The two phases are called the hierarchical level and the positional level of VM-GEN, respectively.

4.2 The Hierarchical Level of VM-GEN

Objects at the hierarchical level try to combine their local structures in order to build the global syntactic tree representing the utterance. In order to preserve the presupposition for parallelism in processing, the objects are only allowed to exchange copies of relevant parts of their structures, so that the global tree is built up virtually. The substitution operation is realized as an exchange of copies of the feature structures associated with the substitution node and the root node of the substitution tree. For adjunction, the feature structures of the node of adjunction and the root node and the foot node of the auxiliary tree are handed over to the communication partner (for more details see [Kil94]). As soon as an object has integrated its local structure into a global syntactic structure, it changes its state again and enters the positional level of VM-GEN.

4.3 The Positional Level of VM-GEN

For an incremental syntactic generator the design of the linearization component is a central task. Language-specific word order rules restrict the possibilities to line up the parts of an utterance under construction.
They often forbid simply adding incrementally given elements at the right end of the utterance so far. This is an important reason for self-corrections and other kinds of suboptimal results. This observation made us interleave the linearization task of computing an adequate order of groups of words with the controlling task of deciding when to hand over successive parts to the articulator (WHEN-to-SAY). In contrast to the approaches of [NF90] and [De 90], we do not create a new process and new local structures to handle the output of the hierarchical level at the linearization level. Instead, the objects at the hierarchical level merely change their state and use the linearization rules associated with their hierarchical structures to solve their new tasks. The main motivations for this integrated approach are:

- Using a separate representation, i.e., local structures, during linearization is partially redundant, since a lot of information from the hierarchical structures is also used as the basis for linearization.
- While a structure is linearized, it may be expanded simultaneously at the hierarchical level because of incrementally given fragments in the input to the generator. Thereby, the state of linearization and output might be used to find out whether the new elements can be integrated into the utterance without leading to an overt repair (see [FS92]). If this is not possible, the system could trigger the choice of another syntactic tree or even request the microplanner to withdraw decisions (see [WAF+93] for an example).

As mentioned at the end of the previous subsection, an object starts working at the linearization level after having integrated its syntactic structure into the global syntactic structure. The attachment to a more global structure supplies the object with contextual information that is used to advantage for selecting appropriate linearization rules[5].
In particular, the linearization of verbal phrases depends on the type of the sentence (e.g., imperative or declarative), which is inherited from another object of the hierarchy. The linearization process is carried out by the objects of VM-GEN in a distributed way. When an object starts linearizing its local structure, it first chooses the set of linearization rules that is suitable with respect to the linearization context stored as the value of the feature `lin-context'. If there are several possible keys for linearization rules, the most adequate one is computed by regarding some global generation parameters (e.g., time restriction). All keys are valued with respect to their appropriateness for different parameter settings.

[Footnote 5: Another reason for the attachment is that there are cases where the elements of a phrase are not locally ordered, forming one continuous block, but are mixed with elements of another phrase, e.g., for scrambling phenomena [BJR91]. These cases are not yet handled by our system but will be realized in an improved version.]

Having found a suitable set of P-rules, the object traverses its local structure, starting with the root node and interpreting the word order rules that are associated with internal nodes to guide the traversal. In order to realize the task of WHEN-to-SAY, the output activities of the objects are synchronized by means of two types of messages. Each object locally decides whether it provides an element of the utterance for articulation. It may happen that several objects want to hand over elements to the articulator at the same time. In VM-GEN, the consistency of the global state of the output is protected by synchronization. An object sends a message to its mother in the object hierarchy, declaring its readiness for output and expecting to get the permission to become active. At each point of processing, there is exactly one object of the distributed parallel system that is responsible for guiding the output activities.
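The valuation of competing rule-set keys against global generation parameters, described above, might look roughly like this. All keys, the parameter name `time`, and the rule strings below are our own inventions for illustration, not VM-GEN's actual data.

```python
# Hypothetical sketch of selecting a set of linearization rules: rule sets
# are indexed by context keys, and competing keys are valued against global
# generation parameters such as a time restriction.

RULE_SETS = {
    ("decl", "fast"):   ["Subj V Obj"],                  # single quick choice
    ("decl", "normal"): ["Subj V Obj", "Obj V Subj"],    # allow topicalization
    ("imp",  "fast"):   ["V Obj"],
}

def choose_rules(lin_context, params):
    """Pick the rule set matching the `lin-context' value, preferring the
    key whose valuation fits the current time-restriction setting."""
    candidates = [key for key in RULE_SETS if key[0] == lin_context]
    best = max(candidates, key=lambda key: key[1] == params["time"])
    return RULE_SETS[best]

print(choose_rules("decl", {"time": "fast"}))    # -> ['Subj V Obj']
print(choose_rules("decl", {"time": "normal"}))  # -> ['Subj V Obj', 'Obj V Subj']
```

Under strong time pressure a key with a single alternative is preferred, so the object can start uttering without weighing word order variants.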
Initially, the highest object in the hierarchy has the license for output. In the course of time, this license is handed over from one object to another in two directions. The object that has the license for output locally traverses its structure. If there are competing candidates for the next position in the utterance, the object considers the order of the messages from its dependent objects that declared their readiness for output. Furthermore, this decision may be influenced by a specific `topic' attribute that can be associated with an input increment and leads to the topicalization of the element referred to. When a terminal node results as filler for the next position, it is inflected and uttered. When an interface node (see footnote 6) is reached, the license for output is handed over to the object that has been engaged in the combination operation, i.e., the object that manages the substitution tree or the auxiliary tree. The further traversal is interrupted until this object has finished its contribution to the utterance and has returned the license.

[Footnote 6: We call interface nodes substitution nodes, nodes of adjunction, as well as root nodes and foot nodes of auxiliary trees.]

Figure 6 shows some objects which are engaged in linearization. The arrows inside the objects' borders illustrate the local traversal of syntactic structures. Arrows pointing from interface nodes to distinct objects depict the transfer of the license for output. The sequence of incoming and outgoing arrows at the node of adjunction NP-NP illuminates the fact that this node is an anchor for several substructures which may be distributed over distinct objects. The resulting fragment of the utterance consists of the sequence Determiner-Adjective-Noun.

An object which has linearized its structure may still receive messages from objects at the hierarchical level which ask to be integrated by adjunction. Adjunction may influence the process of linearization in two ways. First of all, it
leads to the integration of a new element into the phrase, which has to be inserted into the sequence of terminals in an adequate way. Furthermore, it may change the feature structures of several connected TAG-trees, which may be used for tests during linearization.

[Figure 6: Objects of VM-GEN at the Positional Level]

Actually, adjunction in an object at the positional level triggers a test and, if necessary, a recomputation of the local word order. For example, an object managing a noun phrase has already linearized (and uttered) the elements "Der Junge" (the boy) when a new object is created which has to integrate the modifier "klein" (small) into the noun phrase. The word order `Spec# N' has to be recomputed and changed into `Spec# Mod N'. When an object receives a call for adjunction, it might already have handed over some parts of its local phrase to the articulator. In order to cope with this problem, we realized a simple strategy for revision that handles these cases by repeating the whole prefix of the actual utterance (see [KF95]).

5 Summary

The linearization component of VM-GEN makes use of CDL-TAG, a new extension of the TAG formalism. CDL-TAGs realize the H/P-paradigm for TAGs, thereby managing positional rules in a way that allows for a compact representation of alternatives, the valuation of word order decisions with respect to some generation parameters, and the support of incremental linearization and incremental output production by handling the output prefix as a constraint for further possible linearization. The shortcomings of our approach are a somewhat ad hoc realization of contextual influence on word order decisions, the reference to elements that do not occur in the local structure (the sym_i's), and a merely vague idea of how to realize scrambling phenomena.
We hope that further experience with VM-GEN will help us to identify starting points for the improvement of the formalism and its use.
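As a closing illustration, the prefix-repetition repair strategy from the positional level (the "Der Junge" / "Der kleine Junge" example) can be rendered as a toy sketch. This is our own minimal reconstruction, not code from VM-GEN:

```python
# Toy reconstruction (not VM-GEN code) of the simple revision strategy:
# if late adjunction changes a word order whose prefix has already been
# uttered, the whole prefix of the actual utterance is repeated in
# corrected form; otherwise output simply continues.

def output_with_repair(uttered, new_order):
    """Return the complete output stream after a word order change.
    `uttered` is what already reached the articulator, `new_order` the
    recomputed order of the phrase."""
    if uttered == new_order[:len(uttered)]:
        return new_order              # spoken prefix still valid: continue
    return uttered + new_order        # overt repair: repeat the whole prefix

# "Der Junge" was already uttered when the modifier is adjoined:
print(" ".join(output_with_repair(["Der", "Junge"], ["Der", "kleine", "Junge"])))
# -> Der Junge Der kleine Junge
```

If only "Der" had been uttered, the prefix would still be valid and no repetition would be necessary.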
Publication date: 1995